<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title>cs229 Lecture 3 | Dirac Lee</title><meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noodp" />
<meta name="Description" content="cs229 Lecture 3"><link rel="prev" href="https://diraclee.gitee.io/2020/07/cs229_lecture2/" /><link rel="next" href="https://diraclee.gitee.io/2020/07/cs229_lecture4/" /><link rel="canonical" href="https://diraclee.gitee.io/2020/07/cs229_lecture3/" />
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png">
<link rel="manifest" href="/site.webmanifest">
<link rel="mask-icon" href="/safari-pinned-tab.svg" color="#5bbad5">
<meta name="msapplication-TileColor" content="#da532c">
<meta name="theme-color" content="#ffffff"><meta property="og:title" content="cs229 Lecture 3" />
<meta property="og:description" content="cs229 Lecture 3" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://diraclee.gitee.io/2020/07/cs229_lecture3/" />
<meta property="article:published_time" content="2020-07-19T00:00:00+00:00" />
<meta property="article:modified_time" content="2020-07-20T00:00:00+00:00" />
<script type="application/ld+json">
    {
        "@context": "http://schema.org",
        "@type": "BlogPosting",
        "headline": "cs229 Lecture 3",
        "mainEntityOfPage": {
            "@type": "WebPage",
            "@id": "https:\/\/diraclee.gitee.io\/2020\/07\/cs229_lecture3\/"
        },"image": {
                "@type": "ImageObject",
                "url": "https:\/\/diraclee.gitee.io\/logo.png",
                "width":  800 ,
                "height":  600 
            },"genre": "posts","keywords": "machine learning, cs229","wordcount":  1780 ,
        "url": "https:\/\/diraclee.gitee.io\/2020\/07\/cs229_lecture3\/","datePublished": "2020-07-19T00:00:00+00:00","dateModified": "2020-07-20T00:00:00+00:00","license": "This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.","publisher": {
                "@type": "Organization",
                "name": "xxxx",
                "logo": {
                "@type": "ImageObject",
                "url": "https:\/\/diraclee.gitee.io\/logo.png",
                "width":  127 ,
                "height":  40 
                }
            },"description": "cs229 Lecture 3"
    }
    </script><link rel="stylesheet" href="/css/style.min.css"><link rel="stylesheet" href="/css/lib/fontawesome-free/all.min.css"><link rel="stylesheet" href="/css/lib/animate/animate.min.css"></head>
    <body><script>
            window.isDark = (window.localStorage && window.localStorage.getItem('theme')) === 'dark';
            window.isDark && document.body.classList.add('dark-theme');
        </script><div class="wrapper"><nav class="navbar">
    <div class="navbar-container">
        <div class="navbar-header animated bounceIn">
            <a href="https://diraclee.gitee.io">Dirac Lee</a>
        </div>
        <div class="navbar-menu"><a class="menu-item" href="https://diraclee.gitee.io/posts" title="">Posts</a><a class="menu-item" href="https://diraclee.gitee.io/tags" title="">Tags</a><a class="menu-item" href="https://diraclee.gitee.io/categories" title="">Categories</a><a class="menu-item" href="https://diraclee.gitee.io/about" title="">About</a><a class="menu-item" href="https://diraclee.gitee.io" title="English"><i class="fas fa-language fa-fw"></i></a><a href="javascript:void(0);" class="theme-switch"><i class="fas fa-adjust fa-rotate-180 fa-fw" title="Switch theme"></i></a>
        </div>
    </div>
</nav><nav class="navbar-mobile">
    <div class="navbar-container">
        <div class="navbar-header">
            <div class="navbar-header-title animated bounceIn">
                <a href="https://diraclee.gitee.io">Dirac Lee</a>
            </div>
            <div class="menu-toggle" id="menu-toggle">
                <span></span><span></span><span></span>
            </div>
        </div>
        <div class="navbar-menu" id="mobile-menu"><a class="menu-item" href="https://diraclee.gitee.io/posts" title="">Posts</a><a class="menu-item" href="https://diraclee.gitee.io/tags" title="">Tags</a><a class="menu-item" href="https://diraclee.gitee.io/categories" title="">Categories</a><a class="menu-item" href="https://diraclee.gitee.io/about" title="">About</a><a class="menu-item" href="https://diraclee.gitee.io" title="English"></a><a href="javascript:void(0);" class="theme-switch"><i class="fas fa-adjust fa-rotate-180 fa-fw" title="Switch theme"></i></a>
        </div>
    </div>
</nav><main class="main">
                <div class="container"><article class="page"><h1 class="post-title animated flipInX">cs229 Lecture 3</h1><div class="post-meta">
            <div class="post-meta-main"><a class="author" href="https://diraclee.gitee.io" rel="author" target="_blank">
                    <i class="fas fa-user-circle fa-fw"></i>Dirac Lee
                </a>&nbsp;<span class="post-category">收录于&nbsp;<i class="far fa-folder fa-fw"></i><a href="https://diraclee.gitee.io/categories/%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0/">学习笔记</a>&nbsp;</span></div>
            <div class="post-meta-other"><i class="far fa-calendar-alt fa-fw"></i><time datetime=2020-07-19>2020-07-19</time>&nbsp;
                <i class="fas fa-pencil-alt fa-fw"></i>About 1780 words&nbsp;
                <i class="far fa-clock fa-fw"></i>Estimated reading time: 4 minutes&nbsp;</div>
        </div><div class="post-content"><ul>
<li>Linear regression review</li>
<li>Locally weighted regression</li>
<li>Probabilistic interpretation of linear regression</li>
<li>Logistic regression</li>
<li>Newton's method</li>
</ul>
<a class="post-dummy-target" id="线性回归回顾"></a><h2>Linear regression review</h2>
<p>The $i$-th sample is written $(x^{(i)}, y^{(i)})$, where $x^{(i)} \in \R^{n+1}$ (since we fix $x_0 = 1$) and $y^{(i)} \in \R$</p>
<p>Let $m$ denote the number of samples and $n$ the number of features</p>
<p>Hypothesis: $\begin{aligned} h_\theta (x) = \sum_{j=0}^n \theta_j x_j = \theta^T x \end{aligned}$</p>
<p>Cost function: $\begin{aligned} J(\theta) = \frac 1 2 \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})^2 \end{aligned}$</p>
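<p>The hypothesis and cost function above can be transcribed directly into code; the following is a minimal sketch, and the toy dataset is made up for illustration:</p>

```python
def h(theta, x):
    # hypothesis: h_theta(x) = sum_j theta_j * x_j, with x_0 = 1
    return sum(t * xj for t, xj in zip(theta, x))

def J(theta, X, y):
    # squared-error cost: J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2
    return 0.5 * sum((h(theta, x) - yi) ** 2 for x, yi in zip(X, y))

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # x_0 = 1 prepended to each sample
y = [3.0, 5.0, 7.0]                        # exactly y = 1 + 2x
print(J([1.0, 2.0], X, y))  # perfect fit -> 0.0
```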
<p>Given the size of a house, predict its price. For the hypothesis we may choose, among others:</p>
<p>$h_\theta (x) = \theta_0 + \theta_1 x$</p>
<p>$h_\theta (x) = \theta_0 + \theta_1 x + \theta_2 x^2$</p>
<p>$h_\theta (x) = \theta_0 + \theta_1 x + \theta_2 \sqrt x + \theta_3 \log{(x)}$</p>
<p>The latter two may not look like linear functions, but they can still be treated as linear regression</p>
<p>The second can be viewed as $h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ with $x_1 = x$, $x_2 = x^2$</p>
<p>The third can be viewed as $h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$ with $x_1 = x$, $x_2 = \sqrt x$, $x_3 = \log{(x)}$</p>
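<p>In code, the third hypothesis is ordinary linear regression over the transformed features $x_1 = x$, $x_2 = \sqrt x$, $x_3 = \log x$. A minimal sketch (the particular $\theta$ is an arbitrary choice for illustration):</p>

```python
import math

def phi(x):
    # feature map: [1, x, sqrt(x), log(x)], with the intercept term x_0 = 1
    return [1.0, x, math.sqrt(x), math.log(x)]

def h(theta, x):
    # linear in the transformed features, hence still "linear regression"
    return sum(t * f for t, f in zip(theta, phi(x)))

theta = [1.0, 0.5, 2.0, 3.0]
print(h(theta, 4.0))  # 1 + 0.5*4 + 2*sqrt(4) + 3*log(4)
```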
<a class="post-dummy-target" id="局部加权回归"></a><h2>Locally weighted regression</h2>
<a class="post-dummy-target" id="线性回归"></a><h3>Linear regression</h3>
<p>Fit $\theta$ to minimize $\begin{aligned} \frac 1 2 \sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2 \end{aligned}$, then return $\theta^T x$</p>
<a class="post-dummy-target" id="局部加权回归-1"></a><h3>Locally weighted regression</h3>
<p>Fit $\theta$ to minimize $\begin{aligned} \sum_{i=1}^m w^{(i)} (\theta^T x^{(i)} - y^{(i)})^2 \end{aligned}$, where $w^{(i)}$ is a weight function</p>
<p>For example, take $w^{(i)} = \exp{(- \frac {(x^{(i)} - x)^2} {2 \tau^2} )}$, where $\tau$ is called the bandwidth and $x$ is the query point given at prediction time</p>
<a class="post-dummy-target" id="适用性"></a><h3>Applicability</h3>
<p>Locally weighted regression is a good fit when the number of features is small but the number of samples is large</p>
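<p>A minimal sketch of locally weighted regression for a single feature plus intercept, using the Gaussian weight above and solving the weighted normal equations by hand; the dataset is made up, and restricting to one feature is an illustrative simplification:</p>

```python
import math

def lwr_predict(xs, ys, x_query, tau=1.0):
    # Gaussian weights centered at the query point x
    w = [math.exp(-(xi - x_query) ** 2 / (2 * tau ** 2)) for xi in xs]
    # Weighted normal equations for h(x) = t0 + t1*x:
    # [sum(w)     sum(w*x)  ] [t0]   [sum(w*y)  ]
    # [sum(w*x)   sum(w*x^2)] [t1] = [sum(w*x*y)]
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, xs))
    c = sum(wi * xi * xi for wi, xi in zip(w, xs))
    d = sum(wi * yi for wi, yi in zip(w, ys))
    e = sum(wi * xi * yi for wi, xi, yi in zip(w, xs, ys))
    det = a * c - b * b
    t0 = (c * d - b * e) / det
    t1 = (a * e - b * d) / det
    return t0 + t1 * x_query

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x
print(lwr_predict(xs, ys, 2.5))  # ≈ 6.0 (exact line, so any weighting recovers it)
```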
<a class="post-dummy-target" id="线性回归概率解释"></a><h2>Probabilistic interpretation of linear regression</h2>
<a class="post-dummy-target" id="假设"></a><h3>Assumptions</h3>
<p>Following the linear regression model, assume $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$, where $\epsilon^{(i)}$ is the error not captured by the model</p>
<p>Appealing to the central limit theorem, assume $\epsilon^{(i)} \sim \mathcal N(0, \sigma^2)$</p>
<p>That is, $\begin{aligned} \mathbf P(\epsilon^{(i)}) = \frac 1 {\sqrt {2 \pi} \sigma} \exp{(- \frac {(\epsilon^{(i)})^2} {2 \sigma^2} )}  \end{aligned}$</p>
<p>Also assume the $\epsilon^{(i)}$ are independent and identically distributed (IID)</p>
<a class="post-dummy-target" id="推理"></a><h3>Derivation</h3>
<p>This implies $y^{(i)} | x^{(i)}; \theta \sim \mathcal N(\theta^T x^{(i)}, \sigma^2)$, i.e.</p>
<p>$$
\begin{aligned} 
\mathbf P(y^{(i)} | x^{(i)}; \theta) = \frac 1 {\sqrt {2 \pi} \sigma} \exp{(- \frac {(y^{(i)} - \theta^T x^{(i)})^2} {2 \sigma^2} )}
\end{aligned}
$$</p>
<p>and the $y^{(i)} | x^{(i)}; \theta$ are independent and identically distributed</p>
<p>Then the likelihood of the parameters $\theta$ is</p>
<p>$$
\begin{aligned} 
\mathscr L(\theta) 
&amp;= \mathbf P(y | x; \theta) \\ 
&amp;= \prod_{i=1}^m \mathbf P(y^{(i)} | x^{(i)}; \theta) \\ 
&amp;= \prod_{i=1}^m \frac 1 {\sqrt {2 \pi} \sigma} \exp{(- \frac {(y^{(i)} - \theta^T x^{(i)})^2} {2 \sigma^2} )}
\end{aligned}
$$</p>
<blockquote>
<p>Likelihood vs. probability</p>
<ul>
<li>likelihood: how plausible a given set of parameters is</li>
<li>probability: how likely a given piece of data is to occur</li>
</ul>
</blockquote>
<a class="post-dummy-target" id="对数可能性"></a><h3>Log-likelihood</h3>
<p>$$
\begin{aligned} 
l(\theta) 
&amp;= \log {\mathscr L(\theta)} = \log {\prod_{i=1}^m \frac 1 {\sqrt {2 \pi} \sigma} \exp{(- \frac {(y^{(i)} - \theta^T x^{(i)})^2} {2 \sigma^2} )}} \\ 
&amp;= \sum_{i=1}^m [\log { \frac 1 {\sqrt {2 \pi} \sigma} } + \log \exp {( &hellip; )}] \\ 
&amp;= m \log { \frac 1 {\sqrt {2 \pi} \sigma} } + \sum_{i=1}^m( - \frac {(y^{(i)} - \theta^T x^{(i)})^2} {2 \sigma^2} )
\end{aligned}
$$</p>
<a class="post-dummy-target" id="最大可能性评估-mle"></a><h3>Maximum likelihood estimation (MLE)</h3>
<p>Choose $\theta$ to maximize $\mathscr L(\theta)$, which is the same as choosing $\theta$ to maximize $l(\theta)$</p>
<p>Since $\sigma$ does not depend on the parameters, this amounts to choosing $\theta$ to minimize $\begin{aligned} \frac 1 2 \sum_{i=1}^m {(y^{(i)} - \theta^T x^{(i)})^2} \end{aligned} = J(\theta)$</p>
<p>This justifies the squared-error cost function used earlier for linear regression</p>
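<p>The equivalence can be checked numerically; the following is a sketch with made-up data and $\sigma = 1$: a $\theta$ with smaller squared error must have larger log-likelihood:</p>

```python
import math

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # x_0 = 1 prepended
y = [0.5, 2.0, 3.5]                        # exactly y = 0.5 + 1.5x
sigma = 1.0

def sq_err(theta):
    # sum_i (y_i - theta^T x_i)^2, i.e. 2*J(theta)
    return sum((yi - sum(t * xj for t, xj in zip(theta, x))) ** 2
               for x, yi in zip(X, y))

def log_likelihood(theta):
    # l(theta) = m*log(1/(sqrt(2*pi)*sigma)) - sq_err(theta)/(2*sigma^2)
    m = len(y)
    return (m * math.log(1.0 / (math.sqrt(2 * math.pi) * sigma))
            - sq_err(theta) / (2 * sigma ** 2))

good, bad = [0.5, 1.5], [0.0, 0.0]
assert sq_err(good) < sq_err(bad)
assert log_likelihood(good) > log_likelihood(bad)
```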
<a class="post-dummy-target" id="逻辑回归"></a><h2>Logistic regression</h2>
<a class="post-dummy-target" id="二分类问题"></a><h3>Binary classification</h3>
<p>Here $y \in \lbrace 0, 1 \rbrace$. Suppose we try to solve this with linear regression (outputs below 0.5 labeled negative, above 0.5 positive):</p>
<p><figure><img src="/svg/loading.min.svg" data-sizes="auto" data-src="https://gitee.com/DiracLee/picbed/raw/master/img/20200720102648.png" alt="" class="lazyload"></figure></p>
<p>When a single extreme yet correctly labeled sample appears (the new positive example in the figure), the classifier changes drastically, which is clearly unreasonable.</p>
<p>This shows that linear regression is not a good learning algorithm for classification problems.</p>
<a class="post-dummy-target" id="逻辑回归-1"></a><h3>Logistic regression</h3>
<p>Since $y \in \lbrace 0, 1 \rbrace$, we want the hypothesis to satisfy $h_\theta (x) \in [0, 1]$</p>
<p>The answer turns out to be $h_\theta (x) = g(\theta^T x)$, where $g(z)$ is the sigmoid function; as for why, the lecture on exponential families will give the answer.</p>
<p>$$
g(z) = \frac 1 {1 + e^{-z}}
$$</p>
<p><figure><img src="/svg/loading.min.svg" data-sizes="auto" data-src="https://gitee.com/DiracLee/picbed/raw/master/img/20200720133621.png" alt="" class="lazyload"></figure></p>
<p>so that $$h_\theta (x) = \frac 1 {1 + e^{- \theta^T x}}$$</p>
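<p>A direct transcription of $g$ and $h_\theta$; a minimal sketch:</p>

```python
import math

def g(z):
    # sigmoid: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    # logistic hypothesis: g(theta^T x)
    return g(sum(t * xj for t, xj in zip(theta, x)))

print(g(0.0))           # 0.5 -- the decision boundary
print(g(5.0), g(-5.0))  # symmetric: g(-z) = 1 - g(z)
```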
<p>Analyzing again from a probabilistic viewpoint, we have 
$$
\begin{cases}
\mathbf P (y = 1 | x; \theta) = h_\theta (x) \\ 
\mathbf P (y = 0 | x; \theta) = 1 - h_\theta (x)
\end{cases}
$$</p>
<p>Since $y \in \lbrace 0, 1 \rbrace$, the two cases above can be combined into</p>
<p>$$
\mathbf P(y | x; \theta) = h_\theta (x)^y (1 - h_\theta (x))^{1 - y}
$$</p>
<p>The likelihood is</p>
<p>$$
\begin{aligned}
\mathscr L(\theta) 
&amp;= \mathbf P(y | x; \theta) \\ 
&amp;= \prod_{i=1}^m \mathbf P(y^{(i)} | x^{(i)}; \theta) \\ 
&amp;= \prod_{i=1}^m h_\theta (x^{(i)})^{y^{(i)}} (1 - h_\theta (x^{(i)}))^{1 - y^{(i)}}
\end{aligned}
$$</p>
<p>The log-likelihood is</p>
<p>$$
\begin{aligned}
l(\theta) 
&amp;= \log \mathbf P(y | x; \theta) \\ 
&amp;= \log \prod_{i=1}^m h_\theta (x^{(i)})^{y^{(i)}} (1 - h_\theta (x^{(i)}))^{1 - y^{(i)}} \\ 
&amp;= \sum_{i=1}^m y^{(i)} \log h_\theta (x^{(i)}) + (1 - y^{(i)}) \log (1 - h_\theta (x^{(i)}))
\end{aligned}
$$</p>
<p>The goal is to choose a suitable $\theta$ that maximizes $l(\theta)$</p>
<p>Computing the partial derivatives gives</p>
<p>$$
\frac \partial {\partial \theta_j} l(\theta) =  \sum_{i=1}^m (y^{(i)} - h_\theta (x^{(i)}))x_j^{(i)}
$$</p>
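<p>This formula can be verified against a central-difference approximation of $l(\theta)$; a sketch in which the dataset and the evaluation point are made up:</p>

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

# made-up data: x_0 = 1 prepended, one real feature
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

def l(theta):
    # log-likelihood l(theta) from the formula above
    total = 0.0
    for x, yi in zip(X, y):
        p = g(sum(t * xj for t, xj in zip(theta, x)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def grad_j(theta, j):
    # analytic partial derivative: sum_i (y_i - h_theta(x_i)) * x_ij
    return sum((yi - g(sum(t * xj for t, xj in zip(theta, x)))) * x[j]
               for x, yi in zip(X, y))

theta, eps = [0.3, -0.2], 1e-6
for j in range(2):
    tp, tm = theta[:], theta[:]
    tp[j] += eps
    tm[j] -= eps
    numeric = (l(tp) - l(tm)) / (2 * eps)
    assert abs(numeric - grad_j(theta, j)) < 1e-6
```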
<p>Gradient ascent:</p>
<p>$$
\begin{aligned}
\theta_j 
&amp;:= \theta_j + \alpha \frac \partial {\partial \theta_j} l(\theta)  \\ 
&amp;= \theta_j + \alpha \sum_{i=1}^m (y^{(i)} - h_\theta (x^{(i)}))x_j^{(i)}  \\ 
&amp;= \theta_j - \alpha \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)} 
\end{aligned}
$$</p>
<p>This is exactly the same update as gradient descent for linear regression. That is no coincidence: both are instances of generalized linear models, whose scope and properties will be covered in detail in that lecture.</p>
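<p>The batch form of this update might look like the following sketch; the learning rate, iteration count, and toy dataset are all arbitrary choices for illustration:</p>

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]  # x_0 = 1 prepended
y = [0, 0, 1, 1]
theta, alpha = [0.0, 0.0], 0.1

for _ in range(1000):  # batch gradient ascent on l(theta)
    h = [g(sum(t * xj for t, xj in zip(theta, x))) for x in X]
    for j in range(len(theta)):
        theta[j] += alpha * sum((yi - hi) * x[j]
                                for x, yi, hi in zip(X, y, h))

# probabilities at x = -2 and x = 2 after training
print([g(theta[0] + theta[1] * xv) for xv in (-2.0, 2.0)])
```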
<a class="post-dummy-target" id="牛顿法"></a><h2>Newton's method</h2>
<a class="post-dummy-target" id="牛顿法-1"></a><h3>Newton's method</h3>
<p>Newton's method quickly finds a point where the function $f(\theta)$ approaches 0. Its parameter update works as follows:</p>
<ol>
<li>Draw the tangent line to $f(\theta)$ at the point $(\theta, f(\theta))$</li>
<li>Take the tangent line's intercept with the horizontal axis as the new value of $\theta$</li>
<li>Repeat the steps above until convergence</li>
</ol>
<p>As illustrated below</p>
<p><figure><img src="/svg/loading.min.svg" data-sizes="auto" data-src="https://gitee.com/DiracLee/picbed/raw/master/img/20200720142046.png" alt="" class="lazyload"></figure></p>
<p>Algebraically, the update is</p>
<p>$$
\begin{aligned}
\theta
&amp;:= \theta - \frac {f(\theta)} {\frac \partial {\partial \theta} f(\theta)}
\end{aligned}
$$</p>
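<p>The update formula can be sketched as follows; the choice $f(\theta) = \theta^2 - 2$ (whose root is $\sqrt 2$) is purely illustrative:</p>

```python
def newton_root(f, fprime, theta, iters=20):
    # repeat: theta := theta - f(theta) / f'(theta)
    for _ in range(iters):
        theta = theta - f(theta) / fprime(theta)
    return theta

# example: root of f(theta) = theta^2 - 2, starting from theta = 1
root = newton_root(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.0)
print(root)  # converges quadratically to sqrt(2)
```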
<a class="post-dummy-target" id="使用牛顿法优化mle"></a><h3>Optimizing the MLE with Newton's method</h3>
<p>Our goal is to maximize $l(\theta)$, and $l(\theta)$ is concave, so this is equivalent to finding the point where $\frac {\partial} {\partial \theta} l(\theta)$ is closest to 0, i.e. replacing $f(\theta)$ in the update above with $\frac {\partial} {\partial \theta} l(\theta)$, which gives</p>
<p>$$
\begin{aligned}
\theta 
&amp;:= \theta - \frac 1 {\frac {\partial^2} {\partial \theta^2} l(\theta)}  {\frac \partial {\partial \theta} l(\theta)}
\end{aligned}
$$</p>
<p>Comparing with the gradient-ascent MLE update shows that Newton's method is gradient ascent with learning rate $\alpha = - \frac 1 {\frac {\partial^2} {\partial \theta^2} l(\theta)}$</p>
<a class="post-dummy-target" id="高维牛顿法与hessian矩阵"></a><h3>Newton's method in higher dimensions and the Hessian matrix</h3>
<p>In general, the $\theta$ we use is a vector, so in the update the &lsquo;first derivative&rsquo; and &lsquo;second derivative&rsquo; are replaced by the &lsquo;nabla operator&rsquo; and the &lsquo;Hessian matrix&rsquo; respectively</p>
<p>$$
\begin{aligned}
\theta 
&amp;:= \theta - H^{-1} \nabla_\theta  l(\theta)
\end{aligned}
$$</p>
<p>The nabla operator 
$
\nabla_\theta  = 
\begin{bmatrix}
\frac {\partial } {\partial \theta_0} \\ 
\frac {\partial } {\partial \theta_1} \\ 
\vdots \\ 
\frac {\partial } {\partial \theta_n}
\end{bmatrix}
$</p>
<p>The entry in row $i$, column $j$ of the Hessian matrix is
$
H_{ij} = \frac {\partial^2} {\partial \theta_i \partial \theta_j}
$</p>
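<p>The vector update can be sketched for logistic regression with one feature plus an intercept, writing out the $2 \times 2$ Hessian and its inverse by hand. A minimal sketch: the dataset is made up, and deliberately kept non-separable so that the maximum of $l(\theta)$ exists:</p>

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

# overlapping classes (non-separable), x_0 = 1 prepended to each sample
X = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5],
     [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1, 1]
theta = [0.0, 0.0]

for _ in range(10):  # theta := theta - H^{-1} grad l(theta)
    h = [g(sum(t * xj for t, xj in zip(theta, x))) for x in X]
    grad = [sum((yi - hi) * x[j] for x, yi, hi in zip(X, y, h))
            for j in range(2)]
    # Hessian of l: H_jk = -sum_i h_i (1 - h_i) x_ij x_ik  (2x2 here)
    H = [[-sum(hi * (1.0 - hi) * x[j] * x[k] for x, hi in zip(X, h))
          for k in range(2)] for j in range(2)]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # solve the 2x2 system H d = grad by hand, then step theta := theta - d
    d0 = (H[1][1] * grad[0] - H[0][1] * grad[1]) / det
    d1 = (H[0][0] * grad[1] - H[1][0] * grad[0]) / det
    theta = [theta[0] - d0, theta[1] - d1]

print(theta)  # converges in a handful of iterations
```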
<a class="post-dummy-target" id="牛顿法的适用性"></a><h3>When Newton's method applies</h3>
<p>Matrix inversion is computationally expensive, so as a rule of thumb Newton's method is well worth considering when $n \le 50$</p>
<a class="post-dummy-target" id="参考"></a><h2>References</h2>
<p><a href="https://katex.org/docs/supported.html" target="_blank">$\KaTeX$ documentation</a></p></div><div class="post-footer" id="post-footer">
    <div class="post-info">
        <div class="post-info-line">
            <div class="post-info-mod">
                <span>Updated on 2020-07-20</span>
            </div>
            <div class="post-info-license"></div>
        </div>
        <div class="post-info-line">
            <div class="post-info-md"></div>
            <div class="post-info-share"><span><a href="//www.linkedin.com/shareArticle?url=https%3a%2f%2fdiraclee.gitee.io%2f2020%2f07%2fcs229_lecture3%2f&amp;title=cs229%20%e7%ac%ac%e4%b8%89%e8%8a%82" target="_blank" title="Share on LinkedIn">
            <i class="fab fa-linkedin fa-fw"></i>
        </a><a href="//service.weibo.com/share/share.php?url=https%3a%2f%2fdiraclee.gitee.io%2f2020%2f07%2fcs229_lecture3%2f&amp;appkey=&amp;title=cs229%20%e7%ac%ac%e4%b8%89%e8%8a%82" target="_blank" title="Share on Weibo">
            <i class="fab fa-weibo fa-fw"></i>
        </a></span></div>
        </div>
    </div>

    <div class="post-info-more">
        <section><span class="tag">
                        <a href="https://diraclee.gitee.io/tags/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/"><i class="fas fa-tag fa-fw"></i>&nbsp;machine learning</a>&nbsp;
                    </span><span class="tag">
                        <a href="https://diraclee.gitee.io/tags/cs229/"><i class="fas fa-tag fa-fw"></i>&nbsp;cs229</a>&nbsp;
                    </span></section>
        <section>
            <span><a href="javascript:window.history.back();">Back</a></span>&nbsp;|&nbsp;<span><a href="https://diraclee.gitee.io">Home</a></span>
        </section>
    </div>

    <div class="post-nav"><a href="https://diraclee.gitee.io/2020/07/cs229_lecture2/" class="prev" rel="prev" title="cs229 Lecture 2 (TA)"><i class="fas fa-angle-left fa-fw"></i>cs229 Lecture 2 (TA)</a>
            <a href="https://diraclee.gitee.io/2020/07/cs229_lecture4/" class="next" rel="next" title="cs229 Lecture 4">cs229 Lecture 4<i class="fas fa-angle-right fa-fw"></i></a></div>
</div><div class="post-comment"></div>
    </article></div>
            </main><footer class="footer">
    <div class="copyright"><div class="copyright-line">Powered by <a href="https://gohugo.io/" target="_blank" rel="external nofollow noopener noreffer">Hugo</a> | Theme - <a href="https://github.com/dillonzq/LoveIt" target="_blank" rel="external nofollow noopener noreffer">LoveIt<i class="far fa-heart fa-fw"></i></a>
        </div>

        <div class="copyright-line"><i class="far fa-copyright fa-fw"></i><span itemprop="copyrightYear">2020</span><span class="author" itemprop="copyrightHolder">&nbsp;<a href="https://diraclee.gitee.io" target="_blank">Dirac Lee</a></span>&nbsp;|&nbsp;<span class="license"><a rel="license external nofollow noopener noreffer" href="https://creativecommons.org/licenses/by-nc/4.0/" target="_blank">CC BY-NC 4.0</a></span></div>
    </div>
</footer></div><a href="#" class="dynamic-to-top" id="dynamic-to-top" data-scroll>
            <span>&nbsp;</span>
        </a><script src="/js/lib/jquery/jquery.slim.min.js"></script><script src="/js/lib/lazysizes/lazysizes.min.js"></script><script src="/js/lib/smooth-scroll/smooth-scroll.polyfills.min.js"></script><script>window.scroll = new SmoothScroll('[data-scroll]', {speed: 300, speedAsDuration: true});</script><link rel="stylesheet" href="/css/lib/katex/katex.min.css"><script src="/js/lib/katex/katex.min.js"></script><script defer src="/js/lib/katex/auto-render.min.js"></script><link rel="stylesheet" href="/css/lib/katex/copy-tex.min.css"><script defer src="/js/lib/katex/copy-tex.min.js"></script><script defer src="/js/lib/katex/mhchem.min.js"></script><script>
        document.addEventListener("DOMContentLoaded", function () {
            renderMathInElement(document.body, {
                delimiters: [
                    { left: "$$", right: "$$", display: true },
                    { left: "\\(", right: "\\)", display: false },
                    { left: "\\[", right: "\\]", display: true },{ left: "$", right: "$", display: false },]
            });
        });
    </script><script src="/js/blog.min.js"></script></body>
</html>